79 research outputs found

On the Locality of Action Domination in Sequential Decision Making

Author: Lagoudakis Michail G.
Rachelson Emmanuel
Publication date: 01/01/2010

In the field of sequential decision making and reinforcement learning, it has been observed that good policies for most problems exhibit a significant amount of structure. In practice, this implies that when a learning agent discovers that an action is better than any other in a given state, this action also dominates in a certain neighbourhood around that state. This paper presents new results proving that this notion of locality in action domination can be linked to the smoothness of the environment's underlying stochastic model. Namely, we link the Lipschitz continuity of a Markov Decision Process to the Lipschitz continuity of its policies' value functions and introduce the key concept of influence radius to describe the neighbourhood of states where the dominating action is guaranteed to be constant. These ideas are directly exploited in the proposed Localized Policy Iteration (LPI) algorithm, which is an active learning version of Rollout-based Policy Iteration. Preliminary results on the Inverted Pendulum domain demonstrate the viability and the potential of the proposed approach.

Open Archive Toulouse Archive Ouverte

Detecting Olives with Synthetic or Real Data? Olive the Above

Author: Aloimonos Yiannis
Karabatis Yianni
Lagoudakis Michail G.
Lin Xiaomin
Sanket Nitin J.
Publication date: 16/08/2023

Modern robotics has enabled advances in yield estimation for precision agriculture. However, when applied to the olive industry, the high variation of olive colors and their similarity to the background leaf canopy present a challenge. Labeling several thousands of very dense olive grove images for segmentation is a labor-intensive task. This paper presents a novel approach to detecting olives without the need to manually label data. In this work, we present the world's first olive detection dataset composed of synthetic and real olive tree images. This is accomplished by generating an auto-labeled photorealistic 3D model of an olive tree. Its geometry is then simplified for lightweight rendering purposes. In addition, experiments are conducted with a mix of synthetically generated and real images, yielding an improvement of up to 66% compared to using only a small sample of real data. When access to real, human-labeled data is limited, a combination of mostly synthetic data and a small amount of real data can enhance olive detection.

arXiv.org e-Print Archive

Rollout Sampling Approximate Policy Iteration

Author: A. Antos
A. Fern
Christos Dimitrakakis
E. Even-Dar
H. O. Wang
M. G. Lagoudakis
Michail G. Lagoudakis
P. Auer
R. A. Howard
R. Sutton
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme which addresses the core sampling problem in evaluating a policy through simulation as a multi-armed bandit machine. The resulting algorithm offers performance comparable to that of the previous algorithm, but with significantly less computational effort. An order of magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain-car. Comment: 18 pages, 2 figures, to appear in Machine Learning 72(3). Presented at EWRL08, to be presented at ECML 200

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

International Migration, Integration and Social Cohesion online publications

Institutional Repository of the Technical University of Crete

Adaptation-Based Programming in Haskell

Author: Alan Fern
Christopher Bishop
Christopher Simpkins
Chung-chieh Shan
D. Marquardt
David Andre
H. Robbins
Jervis Pinto
K. Levenberg
Martin Erwig
Michael Littman
Michail Lagoudakis
Olivier Danvy
Paul Ruvolo
Peter Auer
Pieter Abbeel
R. Maclin
Richard Sutton
S. Thompson
T. Lai
T. Schrijvers
Thomas Dietterich
Tim Bauer
Umut A. Acar
Publication venue: 'Open Publishing Association'
Publication date: 01/09/2011

We present an embedded DSL to support adaptation-based programming (ABP) in Haskell. ABP is an abstract model for defining adaptive values, called adaptives, which adapt in response to some associated feedback. We show how our design choices in Haskell motivate higher-level combinators and constructs and help us derive more complicated compositional adaptives. We also show that an important specialization of ABP is its support for reinforcement learning constructs, which optimize adaptive values based on a programmer-specified objective function. This permits ABP users to easily define adaptive values that express uncertainty anywhere in their programs. Over repeated executions, these adaptive values adjust to more efficient ones and enable the user's programs to self-optimize. The design of our DSL depends significantly on the use of type classes. We will illustrate, along with presenting our DSL, how the use of type classes can support the gradual evolution of DSLs. Comment: In Proceedings DSL 2011, arXiv:1109.032

arXiv.org e-Print Archive

CiteSeerX

Crossref

Directory of Open Access Journals

Approximate Policy Iteration using Large-Margin Classifiers

Author: Michail G. Lagoudakis
Ronald Parr

We present an approximate policy iteration algorithm that uses rollouts to estimate the value of each action under a given policy in a subset of states and a classifier to generalize and learn the improved policy over the entire state space. Using a multiclass support vector machine as the classifier, we obtained successful results on the inverted pendulum and the bicycle balancing and riding domains.

CiteSeerX

Algorithm Selection using Reinforcement Learning

Author: Michael L. Littman
Michail G. Lagoudakis
Publication venue: Morgan Kaufmann

Many computational problems can be solved by multiple algorithms, with different algorithms fastest for different problem sizes, input distributions, and hardware characteristics. We consider the problem of algorithm selection: dynamically choosing an algorithm to attack an instance of a problem with the goal of minimizing the overall execution time. We formulate the problem as a kind of Markov decision process (MDP) and use ideas from reinforcement learning to solve it. This paper introduces a kind of MDP that models the algorithm selection problem by allowing multiple state transitions. The well-known Q-learning algorithm is adapted for this case in a way that combines both Monte-Carlo and Temporal Difference methods. This work also uses the Least-Squares Temporal Difference (LSTD) algorithm of Boyan and extends it, in a way, to control problems. The experimental study focuses on the classic problems of order statistic selection and sorting. The encouraging results reveal the potential of applying learning methods to traditional computational problems.

CiteSeerX